Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties

Authors

Pablo Gamallo

Iñaki Alegria

José Ramom Pichel Campos

Manex Agirrezabal

Abstract

This article describes the systems submitted by the Citius Ixa Imaxin team to the Discriminating Similar Languages Shared Task 2016. The systems are based on two different strategies: classification with ranked dictionaries and Naive Bayes classifiers. The evaluation results show that ranked dictionaries are more robust and stable across domains, whereas basic Bayesian models perform reasonably well on in-domain datasets but lose accuracy when applied to out-of-domain texts.
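To make the two strategies named in the abstract concrete, the following is a minimal, generic sketch of each: a word-rank dictionary classifier and a multinomial Naive Bayes classifier with add-one smoothing. It is not the team's actual system. The whitespace tokenizer, the dictionary size, the penalty for unseen words, the uniform class priors, all function names, and the toy pt-BR/pt-PT sentences are assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would do more.
    return text.lower().split()


def build_ranked_dictionary(texts, size=1000):
    # Map each of the `size` most frequent words to its rank (0 = most frequent).
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(size))}


def classify_by_rank(text, dictionaries, size=1000):
    # Sum the rank of every token; a word missing from a dictionary costs `size`
    # (hypothetical penalty). The variety with the lowest total cost wins.
    def cost(ranks):
        return sum(ranks.get(tok, size) for tok in tokenize(text))

    return min(dictionaries, key=lambda lang: cost(dictionaries[lang]))


def train_naive_bayes(corpus):
    # Per-variety word counts plus the shared vocabulary.
    counts = {
        lang: Counter(tok for t in texts for tok in tokenize(t))
        for lang, texts in corpus.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab


def classify_naive_bayes(text, model):
    # Multinomial Naive Bayes with add-one smoothing and uniform priors (assumed).
    counts, vocab = model

    def log_likelihood(lang):
        total = sum(counts[lang].values()) + len(vocab)
        return sum(
            math.log((counts[lang][tok] + 1) / total) for tok in tokenize(text)
        )

    return max(counts, key=log_likelihood)


# Toy training data, invented for this sketch (not from the shared task).
corpus = {
    "pt-BR": ["o ônibus chegou cedo", "peguei o trem hoje"],
    "pt-PT": ["o autocarro chegou cedo", "apanhei o comboio hoje"],
}
dictionaries = {lang: build_ranked_dictionary(texts) for lang, texts in corpus.items()}
nb_model = train_naive_bayes(corpus)

sample = "o autocarro atrasou"
print(classify_by_rank(sample, dictionaries))
print(classify_naive_bayes(sample, nb_model))
```

In this sketch the rank-based classifier depends only on the relative order of frequent words, while the Naive Bayes classifier depends on the estimated word probabilities themselves. That difference is one plausible reading of why the abstract reports different behavior on out-of-domain texts, though the paper itself should be consulted for the actual explanation.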

Similar resources

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. This raises the question of whether those results extend to other languages. Since the goal of document clustering is grouping of docum...

Discriminating Similar Languages: Evaluations and Explorations

We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties. We carried out a number of experiments using the results of the two editions of the Discriminating between Similar Languages (DSL) shared task. We investigate the progress made between the two tasks, estimate an upper bound on possible performance using e...

Comparing Approaches to the Identification of Similar Languages

This paper describes the submission made by the MMS team to the Discriminating between Similar Languages (DSL) shared task 2015. We participated in the closed submission track using only the dataset provided by the shared task organisers which contained short texts from 13 similar languages and language varieties. We submitted three runs using different systems and compare their performance. As...

Exploring Methods and Resources for Discriminating Similar Languages

The Discriminating between Similar Languages (DSL) shared task at VarDial challenged participants to build an automatic language identification system to discriminate between 13 languages in 6 groups of highly-similar languages (or national varieties of the same language). In this paper, we describe the submissions made by team UniMelb-NLP, which took part in both the closed and open categories...

Distributed Representations of Words and Documents for Discriminating Similar Languages

Discriminating between similar languages or language varieties aims to detect lexical and semantic variations in order to classify these varieties of languages. In this work we describe the system built by the Pattern Recognition and Human Language Technology (PRHLT) research center Universitat Politècnica de València and Autoritas Consulting for the Discriminating between similar languages (DS...

Publication date: 2016
